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This work is about the use of regularized optimal-transport distances for 
convex, histogram-based image segmentation. In the considered framework, 
fixed exemplar histograms define a prior on the statistical features of the 
two regions in competition. In this paper, we investigate the use of various 
transport-based cost functions as discrepancy measures and rely on a primal- 
dual algorithm to solve the obtained convex optimization problem. 
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1. Introduction 

Optimal transport Optimal transport theory has received a lot of attention during 
the last decade as it provides a powerful framework to address problems which embed 
statistical constraints. Its successful application in various image processing tasks has 
demonstrated its practical interest (see e.g. [10, 11, 13, 7, 8]). Some limitations have been 
also shown and partially addressed, such as time complexity, regularity and relaxation [3, 
6 ]- 

Segmentation Statistical based image segmentation has been thoroughly studied in 
the literature, first using parametric models (such as the mean and variance), and 
then empirical distributions combined with adapted statistical distances, such as the 

*A short version of this work has been published in the proceedings of SSVM’15. 
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Kullback-Leibler divergence. In this work, we are interested in the use of the optimal 
transport framework for Image segmentation. This has been first investigated in [10] for 
ID features, then extended to multi-dimensional features using approximations of the 
optimal transport cost [7, 12], and adapted to region-based active contour in [12], relying 
on a non-convex formulation. In [14], a convex formulation is proposed, making use of 
sub-iterations to compute the proximity operator of the Wasserstein distance, which use 
is restricted to low dimensions. 

In this paper, we extend the convex formulation for two-phase image segmentation 
of [17] for non-regularized as well as regularized [3, 4] optimal transport distances. This 
work shares some common features with the recent work of [5] in which the authors 
investigate the use of the Legendre-Fenchel transform of regularized transport cost for 
imaging problems. 

2. Convex histogram-based image segmentation 

2.1. Notation 

We consider here vector spaces equipped with the scalar product (., .) and the norm 
INI = V ’ •)• The conjugate operator of A is denoted by A* and satisfies (Ax,y) = 
(. x,A*y ). We denote as l n and 0 n G W 1 the n-dimensional vectors full of ones and zeros 
respectively, x T the transpose of x, and V the discrete gradient operator, while Id stands 
for the identity operator. The operator diag(x) defines a square matrix whose diagonal 
is x. Functions is and Is are respectively the characteristic and indicator functions of 
a set S. Proj and Prox stands respectively for the Euclidean projection and proximity 
operator. The set Sk, n '= {% £ K+, (x, l n ) = k} is the simplex of histogram vectors 
being therefore the discrete probability simplex of W 1 ). 

2.2. General formulation of distribution-based image segmentation 

Let I : x G D i-G I{x) G M. d be a color image, defined over the Wpixel domain D 
(N = |D|), and T a feature-transform of n-dimensional descriptors PI(x) G W 1 . We 
would like to define a binary segmentation u : (1 G {0,1} of the whole image domain, 
using two fixed probability distributions of features a and b. Following the variational 
model introduced in [17], we consider the energy 

J(u) = pTV{u) + D(a, h(u )) + D(b, h( 1 - u)) (1) 

where p > 0 is the regularization parameter, 

• the fidelity terms are defined using D(.,.), a dissimilarity measure between features; 

• h(u) is the empirical discrete probability distribution of features PI using the 
binary map u , which is written as a sum of Dirac masses 

h{u) :i/GK n G u(x)Sjr^ x )(y) ; 
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• TV{u ) is the total variation norm of the binary image u , which is related to the 
perimeter of the region R\{u) := {x G Tt \ u(x) = 1} (co-area formula). 

Observe that this energy is highly non-convex since h is a non linear operator, and that 
we would like to find a minimum over the non-convex set {0,1}^. 

2.3. Convex relaxation of histogram-based segmentation energy 

The authors of [17] propose some relaxations and a reformulation in order to handle 
the minimization of energy (1) using convex optimization tools. 

2.3.1. Probability map 

The first relaxation consists in using a segmentation variable ^ : O i—^ [0,1] which is a 
weight function (probability map). A threshold is therefore required to get a binary seg¬ 
mentation of the image into two regions Rt(u) := {x G Tt \ u(x) > t} and its complement 
Rt(u) c . 

2.3.2. Feature histogram 

The feature histogram of the probability map is denoted Hx(u) and defined as the 
quantized, non-normalized, and weighted histogram of the feature image TI 
using the relaxed variable x/ : i—[0,1] and a feature set X — {Xi G K n }i<i<M x 
composed of Mx bins 

(H x (u)) l = u(x)l Cx{i) (FI(x)), Vi € {1,.. . Mx] 
xen 

where i a bin index, Xi is the centroid of the corresponding bin, and Cx(i) C is 
the corresponding set of features (e.g. the Voronoi cell obtained from hard assignment 
method). Therefore, we can write Hx as a linear operator 

Hx : u G R N i-g lx • u G M Mx , with 1 x(hj) ~ 1 if ^l(j) ^ ^x(i), 0 otherwise. 

Note that lx E H MxxN is a fixed assignment matrix that indicates which pixels of 
XI contribute to each bin i of the histogram. As a consequence, (Hx{u) , lx) = 
J2xen u ( x ) = (“> 1 n), so that H x (u) € S Mx ^ U)1) . 

2.3.3. Exemplar histograms 

The segmentation is driven from two fixed histograms a G i and b G <Sm 6 ,i, which 

are normalized [i.e. sum to 1), have respective dimension M a and M^, and are obtained 
using the respective sets of features A and B. In order to measure the similarity between 
the non-normalized histogram Hj^u and the normalized histogram a, while obtaining a 
convex formulation, we follow [17] and consider the fidelity term D (a(u, In), Haul), 
where the constant vector a has been scaled to Hj^u £ ^M a ,(u,l)- 
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2.3.4. Segmentation energy 

Observe that the problem can now be written as finding the minimum of the following 
energy 

E(u) = pTV(u) + ( a(u , 1 N ),H A u) + (b(l N - u, 1 N ),H B (1 N - u )). 

The constant 7 G (0, N ) is meant to compensate for the fact that the binary regions 
Rt(u) and Rt{u) c may have different size. More precisely, as we are interested in a 
discrete probability segmentation map, we consider the following constrained problem: 

min E(u) = min \E(u) := E(u) + t [ 0 i i]iv(u)} . 
ue[o,ip ueR N t L ’ J j 

2.3.5. Simplification 

From now on, and without loss of generality, we will assume that all histograms are 
computed using the same set of features, namely A — B. We will also omit unnecessary 
subscripts in order to simplify notation. Moreover, we also omit the parameter 7 since its 
value seems not to be critical in practice, as demonstrated in [17]. Finally, introducing 
linear operators 

A := aljf € R m ' n and B := bl T N G R M ' N ( 2 ) 

such that Au = (al 7 u = a(u, 1 jv), and the usual discrete definition of total variation 

N 

TF(n) = t|Vn|| 1)2 = ^||Vn(a;)|| 2 = ^ 

2=1 

we have the following minimization problem: 

min p\\Vu\\ + D(Au , Hu) + D(B( 1 — u),H( 1 — u)) + (3) 

Notice that matrix H G R M Ar is sparse (with N non zero values) and A and B are of 
rank 1 , so that storing or manipulating these matrices is not an issue. 

In [17], the distance function D was defined as the L\ norm. In the following sections, 
we investigate the use of similarity measure based on optimal transport, which is known 
to be more robust and appropriate for histogram comparison. The next paragraph details 
the optimization framework used in this work. 



2.4. Optimization 

In order to solve (3), we consider the following dualization of the problem using the 
Legendre-Fenchel transforms of the l? norm and the function D 

min max (Hu,p A ) + (Au,q A ) + (H( 1 -u),p B ) + (B( 1 -u),q B ) + (Vu,p c ) 

(4) 

+ doppM - d *(Pai Qa) - d *(Pb,Qb) - H-Kpfrc)’ 
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where L\\.\\^p is the characteristic function of the convex £2 ball of radius p, while D* is 
the dual of the function D. In order to accommodate the different models studied in this 
paper, we assume here that D* is a sum of two convex functions D* = D* + D 2 , where 
D\ is non-smooth and D?> is differentiable and has a Lipschitz continuous gradient. 

We recover a general primal-dual problem of the form 

min max (Ku,p) + q 0 iiiv(?i) + H{u) — F*(p) — G*(p), (5) 

up L ’ J 

with primal variable u G and dual vector p = [p^, q^p 1 ^, QbiPc\ T ^ R 4M + 2Ar ? where 

• K = [iL T , A T , ~H T , —B T , V T ] T G ]]j( 4 M + 2 A 0 xAr i s a sparse, linear operator; 

• H is convex and smooth ( H(u) = 0 in the setting of problem (5)) with Lipschitz 
continuous gradient ViL with constant Lh ; 

• (^) i s convex and non-smooth; 

• F*(p) — D*(p A , c^) + D*(p£, q B ) + ^||.||^p(p^) is convex and non-smooth; 

• G*(p) = D* 2 (p A ,q A ) + D 2 (p B , q B ) - {H1 N , p B ) - {B 1 N , q B ) is convex and differ- 
entiable with Lipschitz constant 

To solve this problem, we consider the primal dual algorithm of [16, 1] 
f u k + l = Proj [01 ]jv ( u k - r(K T p k + VH(u k ))) 

\ p k+1 — ProX(j p 1 * ( p k + cj{K{2u k+1 — u k ) — VG*(p fc ))) v 

that converges to a saddle point of (5) as soon as (see for instance [1, Eq. 20]) 

£-L H ){];-L G *)>\\K\\ 2 . (7) 

Note this method may benefit from the recent framework proposed in [9], using inertial 
acceleration and pre-conditionning. 


3. Monge-Kantorovitch distance for image segmentation 

3.1. Wasserstein Distance and Optimal Transport problem 

3.1.1. Optimal Transport problem 

We consider in this work the discrete formulation of the Monge-Kantorovitch opti¬ 
mal mass transportation problem (see e.g. [15]) between a pair of histograms a G SM a ,k 
and b G <Sm 6 ,/c• Given a fixed assignment cost matrix G between the 

corresponding histogram centroids A — {^4.i}i<i<M a and B = an opti¬ 

mal transport plan minimizes the global transport cost, defined as a weighted sum of 
assignments 

( M a M h ) 

<^ = E£^ • (s) 

“ t Z"l 
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The sets of admissible histogram and transport matrices are respectively 

S := {a G R Ma ,b G R Mb \ a > 0, b > 0 and (a, 1 M a ) = (&, 1 m 6 )}, (9) 

P(a,6) := {.P G Mf° xM6 ,Pl Mb = a and P T l Ma = b}. (10) 

Observe that the norm of histograms is not prescribed in <S, and that we only consider 
histograms with positive entries since null entries do not play any role. 

3.1.2. Wasserstein distance 

When using Cij = \\A{ — Bj\\ p , then W p (a, 6) = MK(a, b) l / p is a metric between 
normalized histograms. In the general case where C does not verify such a condition, 
by a slight abuse of terminology we will refer to the MK transport cost function as the 
Monge-Kantorovich distance. 

3.1.3. Monge-Kantorovich distance 

In the following, due to the use of duality, it would be more convenient to introduce 
the following reformulation: 

Va,6 MK(a, b) = min (P, C) + qs(a, b). (11) 

PeV(a,b) 


3.1.4. LP formulation 

We can rewrite the optimal transport problem as a linear program (LP) with vector 
variables. The primal and dual problems write 


MK(a) 


mm 


(c, p) + is (a) 


s.t. p> 0, L 1 p =a 


— max (cp /?). 
/3gM m « +m 6 
s.t. L/3<c 


( 12 ) 


where a is the concatenation of histograms: a T — [a T ,b T ] and the unknown vector 
p G corresponds to the bi-stochastic matrix P being read column-wise (i.e. 

Pij = Pi+(j-i).M a )- + Mb linear marginal constraints on p are defined by the 

matrix L T G M a+M b )x(M a M b ) through equation L T p = ct, where 


L t 


1 M b ^\ 1 M h &2 
Id M b Id M b 


^M b Z T M a 

Id M b 


with ei(j) = 5i-j V j < M b . 


Note that we have the following property: ( La)ij — (L^])^ . = ai + bj. 

The dual formulation shows that the function MK(a) is not strictly convex in a. We 
draw the reader’s attention to the fact that the indicator of set S is not required anymore 
with the dual formulation, which will later come in handy. 
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3.1.5. Dual distance 

From Eq. (12), we have that the Legendre-Fenchel conjugate of MK writes simply as 
the characteristic function of the set C c := {/3 | L/3 — c < 0} 

MK*(P) = i Lf)<c (P). (13) 

3.2. Integration in the segmentation framework 

We propose to substitute in problem (3) the dissimilarity functions by the convex 
Monge-Kantorovich optimal transport cost (11). 

In order to apply our minimization scheme described in (6), we should be able to 
compute the proximity operator of MK*, which is the projection onto the convex set 
C c - However, because the linear operator L is not invertible, we cannot compute this 
projector in a closed form and an optimization problem should be solved at each iteration 
of the process (6) as in [14]. 


3.2.1. Bidualization 

To circumvent this problem, we resort to a bidualization to rewrite the MK distance 
as a primal-dual problem. First, we have that MK*(/3) = f*(L/3 ) with /*(r) = L r < c (r), 
so that f(r) = (r, c) + t r >o(r). Then, 


MK*(/3) /*(L/3) max (r, Lf3) — f(r ) max (r, L/3 — c) — L.>o{r) 


MK(a) = max (<a, (3) — /*(L/3) = max (<a, f3) + min(r, c — L/3) + l.> o(r) 

3 3 r ~ 

= min max (r, c) + £.>o(r) + (a — L T r , f3). 

r (3 


(14) 


3.2.2. Segmentation problem 

Plugging the previous expression into Eq. (4) enables us to solve it using algorithm (6). 
Indeed, introducing new primal variables va^b Cl M m related to transport mapping, 
we recover the following primal dual problem 


min max 

ueR N Pa^a^Pb^b^ rM 
r A ,r B eR M2 P C eR 2N 


(Hu,p A ) + ( Au,q A ) + (H( 1 - u),p B ) + (B( 1 -u),q B ) 


(r A , c- L 


PA 

QA 


) + (r B , c- L 


Pb 

qB 


) + (Vu,p c ) 


+ doap (u) + ^->o (ta) + i.>o{r B ) - q|.|Kp(Pc)- 


(15) 


Observe that now we have a linear term H(u , r A , r B ) — (r A + re, c) whose gradient has 
a Lipschitz constant Lh — 0. We have also gained extra non smooth characteristic func¬ 
tions l.> o, whose proximity operators are trivial (projection onto the positive quadrant 
M^ 2 ; prox 6>0 (x) = max{0 ,x}). 
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3.2.3. Advantages and drawback 

The main advantage of this new segmentation framework is that it makes use of op¬ 
timal transport to compare histograms of features, without sub-iterative routines such 
as solving optimal transport problems to compute sub-gradients or proximity operators 
(see for instance [3, 14]), or without making use of approximation (such as the Sliced- 
Wasserstein distance [12], generalized cumulative histograms [11] or entropy-based regu¬ 
larization [4]). Last, the proposed framework is not restricted to Wasserstein distances, 
since it enables the use of any cost matrix, and does not depend on features dimension¬ 
ality. 

However, a major drawback of this method is that it requires two additional primal 
variables ta and tb whose dimension is M 2 in our simplified setting, M being the 
dimension of histograms involved in the model. As soon as M 2 N, the number of 
pixels, the proposed method could be significantly slower than when using L 1 as in [17] 
due to time complexity and memory limitation. This is more likely to happen when 
considering high dimensional features, such as patches or computer vision descriptors, 
as M increases with feature dimension n. 


4. Regularized MK distance for image segmentation 

As already mentioned in the last section, the previous approach based on optimal 
transport may be very slow for large histograms. In such a case, we propose to use 
instead the entropy smoothing of optimal transport recently proposed and investigated 
in [3, 4, 5], that may offer increased robustness to outliers [3]. While it has been initially 
studied for probability simplex S i, we here investigate its use for our framework with 
unnormalized histograms on S. 

4.1. Sinkhorn distances MK A 

The entropy-regularized optimal transport problem ( 11 ) on set S (Eq. (9)) is 

MK A (a,6) := min {(P, C) - jh(P)} + i s (a, b), ( 16 ) 

-r(zV(a,b) v ' 

where the entropy of the matrix P is defined as h(P ) := — (P, logP). Thanks to the 
negative entropy term which is strictly convex, the regularized optimal transport problem 
has a unique minimizer, denoted which can be recovered using a fixed point algorithm 
studied by Sinkhorn (see e.g. [3]). The regularized transport cost MK^(a, b) is thus 
referred to as the Sinkhorn distance. 

4.1.1. Interpretation 

Another way to express the negative entropic term is: 

— h(p ) : p G KL(p||lfc) G R, with k — M a • M & 
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that is the Kullback-Leibler divergence between transport map p and the uniform map¬ 
ping. This shows that, as A decreases, the model encourages smooth, uniform transport 
so that the mass is spread everywhere. This also explains why this distance shows bet¬ 
ter robustness to outliers, as reported in [3]. To conclude, one thus would like to use in 
practice large values of A to be close to the original Monge-Kantorovich distance, but 
low enough to deal with feature perturbation. 

4.1.2. Structure of the solution 

First, the Sinkhorn distance (16) reads as 

MK A (o) := min (p, c + j log p) + Ls(cf). 

pe RM a -M b A (17) 

s.t. p>0, L T p =a 

As demonstrated in [3], when writing the Lagrangian of this problem with a multiplier (3 
to take into account the constraint L T p = a, we can show that the respective solutions 
p* x and Pf of problem (16) and (17) write 

7 / 

logf>A = K L P - c) - 1^ (log Px)i,j = K u i + Vi- Cij) - 1 with 0 = v . 

Remark 1 . The constant —1 is due to the fact that we use the unnormalized KL diver¬ 
gence KL(p\\lk), instead of KL(p |||l&) for instance. 

4.1.3. Sinkhorn algorithm 

Sinkhorn showed that the alternate normalization of rows and columns of any positive 
matrix M converges to a unique bistochastic matrix P = diag(x)M diag(y). The fol¬ 
lowing fixed-point iteration algorithm can thus be used to find the solution Pf: setting 
M\ — e~ xc , one has 

P\ = dmg(x°°)M\ diag(y°°) where x k+1 = f fc and y k+1 = ^ , 

Mf x K 

where a and b are the desired marginals of the matrix. This result enables us to design 
fast algorithms to compute the regularized optimal transport plan, and the the Sinkhorn 
distance or its derivative, as demonstrated in [3, 4]. 

4.2. Legendre-Fenchel transformation of Sinkhorn distance MK A 

Now, in order to use the Sinkhorn distance in algorithm (6), we need to compute its 
Legendre-Fenchel transform, which has been expressed in [4]. 

Proposition 1 (Cuturi-Doucet). The convex conjugate of MK\(a) reads 

MK* x ((3) = \ (Q a (/3), 1 ) with Q\(/3) := (18) 
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We obtain a simple expression of the Legendre-Fenchel transform which is (7°°, but 
unfortunately, its gradient is not Lipschitz continuous. 

To overcome this problem, we propose two solutions in the next paragraphs: either we 
use a new normalized Sinkhorn distance (§ 4.3), whose gradient is Lipschitz continuous 
(§ 4.4), or we rely on the use of proximity operator (§ 4.6). 


4.3. Normalized Sinkhorn distance MK A <w on S<n 

As the set S of admissible histograms does not prescribe the sum of histograms, we 
consider here a different setting in which the histograms’ total mass are bounded above 
by TV, the number of pixels of the image domain ft 


S< N := { a(ER Ma ,beR Mb 


a > 0,6 > 0, (a, l Ma ) 


(' b , 1 M b ) < iv} • 


(19) 


Moreover, as the transport matrix Pf is not normalized (i.e. (Pf, 1) < TV), we also 
propose to use a slightly normalized variant of the entropic regularization: 


hip) ■= Nh (#) = -N KL (#||1) = -(p, log p) + (p, 1) log TV. 
Corollary 1. The convex conjugate of the normalized Sinkhorn distance 


mm 

peR M a .M h 
r.T<r 


{(p. C+ jlogp- ^1)} + <■$<* («) 


MK\<:^(a) := 

s.t. p> 0 , L 1 p=a 

reads, using the matrix-valued function Q\{.) ^ e A(£--c)-i defined in (18) 

' T<Qa(/3),1) if(Qx(P), 1)<1 


mk* xsn {P) = 


Proof. See appendix A.l. 


flog(Q A (/3), 1) + f if(Q\(l 3), 1)^1 


N 


( 20 ) 

( 21 ) 


( 22 ) 


□ 


Observe that the dual function MK^ <iV (/3) is continuous for (Q A (/?*), 1) = 1. Note 
also that the optimal matrix now is written Pf — NQ\(/3 *) if ( Q\(/3 *), 1) < 1, and 

p \ = N (Swrr) otherwise - 


4.4. Gradient of MK^ <iV 

From Corollary 1, we can express the gradient of MK A <N which is continuous (writing 
Q in place of Q\(/3) to simplify expression) 


VMK A; < Ar (/3) 


N (Q1m &5 ^M a Q) if (Qj 1) ^ 1 

(Q, 1 ) {Q^M h , ^M a Q) if (Qj 1 ) ^ 1 


(23) 


We emphasis here that we retrieve a similar expression than the one originatively demon¬ 
strated in [5], where the authors consider the Sinkhorn distance on the probability sim¬ 
plex Si (i.e. the special case where TV — 1 and (Q, 1) = 1). 
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Proposition 2 . The gradient V7 MK* X <N is a Lipschitz continuous function of constant 
Lmk* bounded by 2XN. 

Proof. See appendix A. 2 . □ 

4.5. Optimization using V MK X<N 

The general final problem we want to solve can be expressed as: 

min p TV(u) + MKa,<a r(H a u, Au ) + MKa,<]v(#&(1 — u),B( 1 — u)) + Oo,ip i u )- (24) 

U — — L J 


Using the Legendre-Fenchel transform, the problem (24) can be reformulated as: 
min max (H a u,p A ) + (Au, q A ) + (H b (l - u),p B ) + (B(l - u), q B ) + (Vu,p c ) 

u Pa’Qa 
PB iQb iPC 

+ — M.K X ^ n (Pai Qa) ^K\,^n(Pbi Qb) ~ L \\.\\^p{Pc)i 

and can be optimized with the algorithm ( 6 ). Using proposition 2, VG* is a Lipschitz 
continuous function with constant Lq * checking Lq* = 2L MK * + \\HbW + ||£>|| = 2XN + 
\\HbW + ||5||, where N is the number of pixels. It will be large for high resolution images 
and huge for good approximations of the MK cost (i.e. X 1). Such a scheme may 
thus involve a very slow explicit gradient ascent in the dual update ( 6 ). In such a case, 
we can resort to the alternative scheme proposed in the next subsection. 

4.6. Optimization using proximity operator of MKa* 

An alternative optimization of (24) consists in using the proximity operator of MKJ. 
Since we cannot compute the proximity operator of MK^ in a closed form, we resort 
instead to a bidualization, as previously done in Section 3.2. 

Considering now the normalized function MKa (a) using entropy normalization (20) 
on set 5, we thus have MK^(/5) = y (Qa(/5), 1) = g x (L/3). 

Proposition 3. The proximity operators of g x (q) = y (e A ( g-c ) -1 , 1) is 


where W is the Lambert function, such that w = W(z) is solution of we w = z. The 
solution is unique as z — XrNeK p- 0 )- 1 ^ 0 . 

Proof. See appendix A.3. □ 

Remark 2 . Note that the Lambert function can be evaluated very fast. 


prox , 


' 9 \ 


(p)=p-j-W (AriVe A(p - 


(25) 
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5. Experiments 

Experimental setting In this experimental section, exemplar regions are defined by 
the user with scribbles (see Figures 1 to 6 ). These regions are only used to built prior 
histograms, so erroneous labeling is tolerated. Histograms a and b are built using hard- 
assignment on M = 8 n clusters, which are obtained with the K-means algorithm. 

We use either RGB color features (J 7 = Id and n = d = 3) or the gradient norm of 
color features (J 7 = ||V.|| and again n = d = 3). The cost matrix is defined from the 
Euclidean metric ||-|| in space, combined with the concave function 1 — which 

is known to be more robust to outliers. Region Rt(u ) is obtained with threshold t = 
as illustrated in Figure 1 . Approximately 1 minute is required to run 500 iterations and 
segment a 1 Megapixel color image. 


Results Figure 1 shows the influence of the threshold t used to get a binary segmen¬ 
tation. A small comparison with the model of [17] is then given in Figures 2 and 3. 
This underlines the robustness of optimal transport distance with respect to bin-to-bin 
L 1 distance. Contrary to optimal transport, when a color is not present in the refer¬ 
ence histograms, the L 1 distance does not take into account the color distance between 
bins which can lead to incorrect segmentation. The robustness is further illustrated in 
Figure 5. It is indeed possible to use a prior histogram from a different image, even 
with a different clustering of the feature space. Note that it is not possible with a bin- 
to-bin metric, which requires the same clustering. Figure 4 shows comparisons between 
the non-regularized model, quite fast but high dimensional model, with the regularized 
model, using a low dimensional formulation. One can see that setting a large value of A 
gives interesting results. On the other hand, using a very small value of A always yields 
poor segmentation results. 

Some last examples on texture segmentation are presented in Figure 6 where the 
proposed method is perfectly able to recover the textured areas. We considered in this 
example the joint histograms of gradient norms on the 3 color channels. Note that the 
complexity of the algorithm is the same as for color features, as long as we use the same 
number of clusters to quantize the feature space. 


6. Conclusion and future work 

Several formulations have been proposed in this work to incorporate transport-based 
distances in convex variational model for image processing, using either regularization 
of the optimal-transport or not. 

Different perspectives have yet to be investigated, such as the final thresholding oper¬ 
ation, the use of capacity transport constraint relaxation [ 6 ], of other statistical features, 
of inertial and pre-conditionned optimization algorithms [9], and the extension to region- 
based segmentation and to multi-phase segmentation problem. 
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£ = 0.1 £ = 0.2 £ = 0.9 


Figure 1: Illustration of the image segmentation method and the influence of the final 
threshold parameter £ on the segmentation result (here using non regularized 
optimal transport, i.e. A = oo). The user defines scribbles which indicates 
the object to be segmented (here in red) and the background to be discarded 
(in green). The output image u is a regularized weight map that gives the 
probability of a pixel to belong to the object. This probability map u is finally 
thresholded with a parameter £ to segment the input image into a region Rt(u ), 
which contour is displayed in red. In the rest of the paper, £ = 0.5 is always 
used, but other strategies may be defined, such as selecting the threshold value 
that minimizes the non-relaxed energy (1). 



Input L\ OT (A oo) 


Figure 2: Robustness of OT with respect to L\\ the blue colors that are not in the 
reference histograms are considered as background with OT distance and as 
foreground with the L\ model, since no color transport is taken into account. 
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6 Conclusion and future work 



Figure 3: Comparison of the segmentation results obtained from the proposed segmen¬ 
tation models (using MK^ distances) together with the L\ distance used in 
[17]. The same regularization parameter p is used for every segmentations. 
Note that the optimal transport similarity measure is a more robust statistical 
metric between histograms than L\. 



Figure 4: Comparison of segmentations obtained from the proposed models. The input 
areas are used to compute the reference color distributions a and b. The non- 
regularized model corresponds to A = +oo, increasing regularization effects 
are then shown. 
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Input histogram 


Resulting segmentation 




Segmentation on a different image with the same exemplar histogram 

Figure 5: Illustration of the interest of optimal transport for the comparison of his¬ 
tograms. Its robustness makes it possible to use prior histograms from different 
images (here histograms are taken from image 1 and used to segment images 
2 and 3), even with a different clustering of feature space. Note that it is not 
possible with bin-to-bin metric, which requires the same clustering. 






Input 


A = oc 


Figure 6: Texture segmentation using joint histograms of color gradient norms. 
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A. Proofs 

A.l. Proof of Corollary 1 

Proof. Let us consider the problem: 

MKa,< cv(a, b ) 

= min (P,C)l4 y(P,log(P/A0> 

PeV(a,b ) A N ' 

= minmax j(P, log P / N + c) + { u ,a- P1 M „) + {v,b- P T l Ma )+w(N - I^PImJ j (26) 

= max j(u, a) + (v,b) + Nw + min(P, log P / N + C - ul^ b - 1 Ma v T - w1m 0 xm„)| 

w^. 0 


where 1 M a xM b = 1 M a ^M b is a matrix full of one, and because (u, PlM b ) = Ynt\ iH p i,j 

Eij P >PM = (P,ul T Mb ) and (v, P T l Ma ) = {P, 1 M a v T ). 

We then compute the first partial derivative of the Lagrangian 

C{P,u,v,w ) := (P, log P — log N + A(C — ulj^ b - lM a v T - wl Ma xM b ) 

dpC(P*,U,V,W ) = 0= log P* — log IV + A (C — ulM b - 1 M a V T — wl M„xmJ + 1 M a xM b 
so that: 


log P*j = -1 + log iV - A (Cij -Ui- Vj - w) 

p*. _ g-l+log N+\w e -X(Cij-Ui-Vj ) 
't'lJ 


(27) 


Replacing this result back in the equation, we get: 


MK x^N(a,b) = max max (u,a) + (v,b) + Nw + T (P*, -1 m.xm,) 

I A 


N 

= max (iz, a) + (v, 6 ) + max TV?/;-—e 

. 1 o A 


-l+Aty 


e -A (Cij-ui-vj) 


( 28 ) 


Case 1: re < 0 Let us first consider the case where the constraint is saturated, that 
is when w < 0. When computing the maximum of /( w) = Nw — ^■e~ l+ ‘ Kw K which is 
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concave < 0), we obtain that e 1+A ™* — X — e Ui so that 

the constraint YU j Pfj — N is checked. Then, we have: 


f(w*) = Nw* - ^ = Na w* - 1) = ^loge^* -1 = -^log I V e - A ( Ci ,r«i-»j) 

A A ^ a I z — ^ 


A 


A 




We finally get 


MKy^jv(a, h) = max 


TV 


(u, a) + ( V , b) - —log ^2 


z> A {pi,j Ui Vj ) 


(29) 


h3 


which means that MK \^n{u,v) = ^ylog e x ( Ci J Ui j . As these functions 

are convex proper and lower semi-continuous, we have that which 

concludes the proof. 


Case 2: w = 0 Now we consider the case where the constraint is not saturated. Going 
back to relation (28), the unconstrained optimal value ic* is still given by 


e Xw * = 


E p 1 A (Ci,j 'U'i 'Vj') 
hi 


which involves 


ic* = — — log 
A 


E' 


hJ 


, 1 A {Ci^j Ui Uj) 


Hence, w* > 0 as soon as V,; j e 1 x ^ CiJ Ui Vj - 1 < 1. In this case, the optimal value is 
therefore projected to w* = 0. Relation (28) give us the following expression: 


max 

ueR M a,v&R M b 


(u,p) + (v,q) 


N 

X 


E' 


hJ 


,-1-A (Cij—Ui—Vj) 


which concludes the proof. 


□ 


A.2. Proof of proposition 2 

Proof. The derivative VMK^(I) with X = (u,v) is lipschitz continuous iff there exist 
Lmk* > 0 such that 


||VMK^(X) - VMK^(X')|| < L mk *||^-^ / ||- 

We denote as U\ the set of vectors X — (u.v) such that V- • e~ 1 ~ x( ^ Ci o~ Ui ~ v ^ ^ 1 and 

V 7 Z - J ^i3 

Vi the set V- • e -1_ x i Ci 4^ Ui ~ v f) ^ 1. We will detail three cases. 

z —' L ,j 
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Case 1 We first consider X, X' E U\. As VMK^(I) is derivable in the set [/]_, it is a 
lipschitz function iff the norm of the Hessian matrix T~L of MKj(u, v ) is bounded. Being 
{Ah}^° 'i~ Mb the eigenvalues of H, its l 2 norm is defined as \\H\\ = max^ Moreover, as 
MK* X is convex, we know that all its eigenvalues are non negative. Thus, we have that 
the norm of H is bounded by its trace: || T~L\\ ^ Tr(T~L ) = Yi Ah — Yi 
The Hessian matrix T~L of MK^(x, y ) is defined as: 


U = 


w 11 

_(h 12 ) t 


^ i2 i 

^ 22 ’ 


with ft 11 = ViViMK^, ft 12 = V 2 ViMK^ and U 22 = V 2 V 2 MKJ. One can first 
observe that 


(ViMK^(«, v))i = a„,MK^( V ) = NP*(ui,v)l Mb = N 


Yi e~ x ( Ci ’ l ~ Ui - v ri 

Y^k i e~ x ^ kjl ~ u k~ Vl ^ 


Hence the diagonal element of the matrix are 7 -L\l — \{u, v ) and H?? = ^.MKj(x, y ). 

They read: 


nil = xn 


n 22 = xn 


X'' e ~MCij-Ui-vi) e —A (Ck,i—u k 


,l-u k -vi) 


Ek^~ x(Ck ' l ~ Uk ~ vl) ) 

J2 k e~ x( ' Ck ’i~ u k~ v ^J2k l^j e~ x( ' Ck ’ l ~ u k~ v ^ 

Y^- x{cn ~ vi) ) 2 


(30) 


Computing the trace of the matrix 7A, we have: 

lift|| < Tr(n) = E W «+Eft?i < 2AiV - 

* J 

Case 2 We now consider X,X' E Vi. In this case, we have for X = (i/, v): 

MKJ(«, n) = 

h3 

As the double derivative w.r.t is (u, v) = X Yj e 1 X(yCi ' j Ui Vj \ the trace of 

the Hessian matrix reads: 

Tr(U ) = A EE e -l-A (Cij-Ui-vj) _|_ EE e_1 — HCij-Ui-vj ) J ^ 2XN, 

\ i 3 3 i ) 


since (u,v) € V\. 


- 18 - 



Convex Image Segmentation with Optimal Transport 


J. Rabin and N. Papadakis 


Case 3 We consider X G U\ and X' G V\. We denote asFa vector that lies in the 
segment [X]X'] and belongs to the boundary of U\ and V±. We thus have \\X — Y\\ + 
\\X'-Y\\ = \\X-X'\\ so that 

|| VMK^(X) - VMK^(X')|| 

< ||VMK^(X) - VMK^(y)|| + ||VMK^(X') - VMK*(y)|| (31) 

< 2A7V(||X - y|| + \\X' - yII) = 2XN\\\X - X'W 


□ 


A.3. Proof of proposition 3 

Proof. We are interested in the proximity operator of f*(q) = y (e A ( g_c ) _1 , 1). First 
notice that the proximity operator of /* can be computed easily from the proximity 
operator of / through Moreau’s identity: 

pro x T /(p) =P~ rproxj* / t ( p/ r ). 

We now recall that the Lambert function W is defined as: 

z = we w w = W(z) 

where w can take two real values for z g] — y 0], and only one on ]0, oo[, as illustrated 
in Figure 7. As z will always be positive in the following, we do not consider complex 
values. 



Figure 7: Lambert function W(z). 










References 


We recall that the proximity operator of /* at point p reads: 


/ s . \\q-p\\ , f * ( \ 

prox r/ * (p) = argmm-—-+ f (q) 

Q 


— argmm 


in ^ 


2 T 

\\9k-Pk\\*N 


2 T 


+ _y e %-^H = g * 


(32) 


This problem is separable and can be done independently Vfc G [1, M a M fe ]. Deriving the 
previous relation with respect to q k , the optimality condition reads: 

4 - Pk + rNe^k-^)- 1 = 0 
— Qk) e ~ Xq ^ = TNe~ Xck ~ 1 
<^X (p k — ql)e^ Pk ~ q ^ = \TNe^ Pk ~ Ck ^~ l . 

Using Lambert function, we get: 

KPk - Qk) = WiXrNe^-^- 1 ) 

^ql=P k -\w{XrNe X ^-^- 1 ). 

As f*/r is convex, the prox operator is univalued, then only one possible value of the 
Lambert function is admissible, which is indeed the case since z is strictly positive). The 
proximity operator of rf* thus reads 

prox T j*(p) = p — . (35) 

which is in agreement with [2] (Chapter 10, page 190, property xii). □ 


(33) 


(34) 
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